Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor

ABSTRACT

A domain processor includes at least one robot modeler, at least one domain modeler, and at least one Query Processor Modeler. The robot modeler includes means for modeling at least one computer-based robot, the robot adapted for accessing at least one web-based data source including entities in a predefined domain. The domain modeler includes means for modeling at least one domain model and means for establishing at least one extraction model and at least one storage model. The Query Processor Modeler includes means for selecting at least two Query Processor elements from a set of predefined query processor elements, means for combining the selected Query Processor elements, and means for executing the associated query processor elements on at least one computer system, at least one of the query processor elements of the associated query processor elements being a Robot query processor Element adapted for accessing the web-based data source.

TECHNICAL FIELD OF INVENTION

The invention relates to a query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor.

DESCRIPTION OF RELATED ART

The invention deals with accessing, i.e. reading and/or writing in data sources associated with a certain domain. The data sources are typically web-based which basically means that the data of the data source are made available to the user according to a serial transfer protocol, e.g. http via the Internet. The serial transfer of the data made available to the user is sometimes easily conceivable to a user, especially when dealing with a simple and quite specific request. A problem with data retrieval from web-based data sources is that the user must typically find one or several data sources comprising the relevant data. This search may be very time consuming and typically non-exhaustive due to the fact that several data sources may easily be overlooked. Moreover, the user has to perform further queries on each site and these queries typically have to be made different from site to site.

This problem has been dealt with in the prior art by applying robots and agents with the purpose of collecting information within a certain domain of interest and by providing these domain data or an extraction of the data to a user in a more straightforward searchable way.

A problem with the known systems applying agents is that the agents require some kind of knowledge about the data source structure, and the use of the agent requires the consent of the owner of the data source due to the fact that an agent may dig into a data source more or less out of control.

Another problem with the known systems applying robots is that the robots likewise require some kind of knowledge about the data source structure, e.g. knowledge of the structure of data containing an HTML table of a web-based data source, and if this knowledge is not available, the programming of such a robot is quite difficult. Hence, the applicable number of robots retrieving data from such data sources is limited as is the data of interest in the domain.

BRIEF SUMMARY OF THE INVENTION

The invention relates to a domain processor (DP) comprising

-   at least one robot modeler (RM),
-   at least one domain modeler (DMR),
-   at least one Query Processor Modeler (QPM),

said robot modeler (RM) comprising

-   means for modeling at least one computer-based robot (R),
-   said at least one robot (R) being adapted for accessing at least one web-based data source (DS),
-   said at least one data source (DS) comprising entities comprised in a predefined domain (D),

said at least one domain modeler (DMR) comprising

-   means for modeling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),
-   means for establishing at least one extraction model (EM) associated with a chosen domain,
-   means for establishing at least one storage model (STM) associated with said chosen domain,

said at least one Query Processor Modeler (QPM) comprising

-   means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),
-   means for combining at least two of the selected Query Processor elements (QPE),
-   means for executing said associated query processor elements on at least one computer system (CS),
-   at least one of said query processor elements (QPE) of associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).

When the domain processor (DP) comprises at least one query processor maintenance manager (QMM), said at least one query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor, an advantageous embodiment has been obtained.

According to the invention, the domain processor may advantageously comprise a tool for running a query processor established by the domain processor. The query processor maintenance manager may thus be adapted for running the query processor on one or several servers.

Such a manager may include a visual tool illustrating the running state of the query processor and the individual elements. An example of such intuitive processing is that the individual elements change color according to their state, e.g. within a color range from white to red, depending on the load of the elements.
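The load-dependent coloring mentioned above might be sketched as follows. This is a minimal illustration only, assuming that the load of an element is available as a number between 0.0 (idle) and 1.0 (fully loaded); the function name `load_to_color` and the linear white-to-red scale are assumptions and not part of the described manager.

```python
def load_to_color(load: float) -> str:
    """Map a load in [0.0, 1.0] to a hex color between white (idle) and red (full load)."""
    # Clamp out-of-range readings so the color always stays on the scale.
    load = min(max(load, 0.0), 1.0)
    # Red stays at full intensity; green and blue fade out as the load grows.
    fade = int(255 * (1.0 - load))
    return f"#ff{fade:02x}{fade:02x}"
```

An idle element would thus be drawn as `#ffffff` and a fully loaded element as `#ff0000`.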

Moreover, the manager should preferably illustrate basic on-off conditions visually, i.e. illustrate actively if an element is working properly, whether entities are transferred between the query processor elements, and whether entities may actually be transferred between elements. The latter feature may ease operation of the system significantly due to the fact that the absence of an entity flow between the elements does not necessarily indicate that a fault condition has occurred; the element may simply not have been queried.

Determination of a “clear road” between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
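The “clear road” test could, for instance, be sketched as below. The class `Element`, its `receive` method and the function `clear_road` are hypothetical names introduced for illustration; the sketch assumes that every element can acknowledge a dummy query without processing it.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical query processor element with outgoing links to other elements."""
    name: str
    working: bool = True
    downstream: list = field(default_factory=list)  # elements fed by this one

    def receive(self, query: dict) -> bool:
        """Acknowledge a query; a faulty element gives no acknowledgement."""
        return self.working


def clear_road(source: Element) -> dict:
    """Forward one dummy query over every outgoing link and report which links answer."""
    dummy = {"dummy": True}  # marked so that receivers need not process it
    return {target.name: target.receive(dummy) for target in source.downstream}
```

A maintenance manager might call `clear_road` for each element at certain intervals and mark any link reporting `False` as blocked, even when no real entities are currently flowing.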

Moreover, the Query Processor Modeler may include submenus facilitating specialized execution of the query processor.

Moreover, the invention relates to a robot modeler (RM) comprising

means for modeling at least one computer-based robot (R),

said at least one robot (R) being adapted for accessing at least one web-based data source (DS),

said at least one data source (DS) comprising entities comprised in a predefined domain (D).

Moreover, the invention relates to a domain modeler (DMR) comprising

means for modeling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),

means for establishing at least one extraction model (EM) associated with a chosen domain,

means for establishing at least one storage model (STM) associated with said chosen domain.

Thus, a domain model represents a structured way of defining properties of different aspects of a domain.

A domain model may e.g. comprise an extraction model, i.e. a definition of relevant entities and attributes to be looked for in the web-based data source. It should be noted that the extraction model may primarily describe (or mask) the data source on the basis of text strings and combinations of such strings.

A chosen domain may e.g. be “cars offered for sale”.

When the domain modeler comprises means for establishing reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data, a further advantageous embodiment of the invention has been obtained.

When said reference mapping defines a set of reference entities describing a number of entities (E), said entities having attributes, a further advantageous embodiment of the invention has been obtained.

A set of reference entities may e.g. be a product catalogue.

Reference mapping may facilitate the possibility of adding knowledge to the retrieved entities. Such information may e.g. be information deducible from a reference product catalogue. Thus, if an entity is matched to an entity type of the product catalogue, the entity may be modified, e.g. validated, corrected or supplemented with additional information about the entity.

A correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue. This false attribute may be detected in several different ways within the scope of the invention. The reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine. Moreover, the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute “Diesel” may then be corrected.

Furthermore, the reference entities may be applied for different variants of classification and validation.

When the domain modeler (DMR) comprises means for establishing at least one language domain dictionary (LDD), a further advantageous embodiment of the invention has been obtained.

When said at least one language domain dictionary (LDD) maps the language of the extracted entities into the general language of the query processor (QP), a further advantageous embodiment of the invention has been obtained.

The general language of the query processor may e.g. be regarded as the “language” defined by an object-oriented conceptual model associated with the query processor. Such language may e.g. be a preferred language or coding chosen as the general language. Hence, the language domain dictionary may e.g. make it possible to have an entity that reads “wagen” or “bil” transformed into an instance of an object “car”.
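A language domain dictionary of this kind might be sketched as a simple lookup from extracted words to conceptual classes. The class `Car`, the table `LANGUAGE_DOMAIN_DICTIONARY` and the function `to_concept` are hypothetical names used for illustration; only the words “wagen”, “bil” and “car” come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """Conceptual object standing in for the general-language concept "car"."""


# Hypothetical dictionary: extracted word (lower case) -> conceptual class.
LANGUAGE_DOMAIN_DICTIONARY = {"wagen": Car, "bil": Car, "car": Car}


def to_concept(word: str):
    """Return an instance of the conceptual object for an extracted word, or None if unknown."""
    concept_class = LANGUAGE_DOMAIN_DICTIONARY.get(word.strip().lower())
    return concept_class() if concept_class else None
```

Under these assumptions, `to_concept("Wagen")` and `to_concept("bil")` both yield a `Car` instance, whereas an unknown word yields `None`.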

When said domain modeler (DMR) comprises means for establishing a set of reference recognition patterns, a further advantageous embodiment of the invention has been obtained.

The set of reference recognition patterns may e.g. comprise character patterns (also known as regular expressions) or character structures (even pictures) to be applied when identifying attributes and entities, e.g. Ltd., Corp or A/S indicating that a company attribute or entity is associated with the character pattern in English, American English and Danish, respectively.

Evidently, such reference patterns will typically be domain specific or at least language specific.
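The company-suffix patterns mentioned above could be expressed with regular expressions roughly as follows. The table `COMPANY_PATTERNS` and the function `recognize_company` are hypothetical, and the exact expressions are an assumption about how such patterns might be written; only the suffixes Ltd., Corp and A/S and their associated languages are taken from the text.

```python
import re

# Hypothetical reference recognition patterns: language -> company suffix pattern.
COMPANY_PATTERNS = {
    "English": re.compile(r"\bLtd\.?(?!\w)"),
    "American English": re.compile(r"\bCorp\.?(?!\w)"),
    "Danish": re.compile(r"\bA/S(?!\w)"),
}


def recognize_company(text: str):
    """Return the language whose company pattern occurs in the text, or None."""
    for language, pattern in COMPANY_PATTERNS.items():
        if pattern.search(text):
            return language
    return None
```

The negative lookahead `(?!\w)` keeps the pattern from firing inside longer words, so that a string such as “Corporate news” is not mistaken for a company name.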

Moreover, the invention relates to a query processor modeler (QPM) comprising

means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),

means for combining at least two of the selected Query Processor elements (QPE),

means for executing said associated query processor elements on at least one computer system (CS),

at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).

According to the invention, a domain-accessing system may be established by means of general components. Moreover, the components may rely on general knowledge about the domain of interest, thereby facilitating very fast establishment of domain-accessing systems.

When the Query Processor Modeler comprises a graphical user interface (GUI) in the form of a visual programming tool, a further advantageous embodiment of the invention has been obtained.

When said set of query processor elements (QPE) comprises at least two different types of query processor elements,

at least one type being a robot query processor element (RQPE) and at least one type being a trigger query processor element (TQPE), a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a query processor maintenance manager (QMM) comprising

means for executing at least one query processor (QP) established by the domain processor.

According to the invention, the query processor maintenance manager should be adapted for controlling the processing of an established query processor.

When said maintenance manager (QMM) comprises means for monitoring the state of at least one query processor element (QPE) or the performance of at least one query processor element (QPE), a further advantageous embodiment of the invention has been obtained.

When said domain processor maintenance manager (QMM) comprises means for evaluating the data flow between query processor elements (QPE) of a query processor path, a further advantageous embodiment of the invention has been obtained.

When said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of the individual modules of a query processor, a further advantageous embodiment of the invention has been obtained.

When said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of a query processor (QP) on element basis, a further advantageous embodiment of the invention has been obtained.

According to the invention, the elements may be advantageously monitored as visually separated elements.

Moreover, the invention relates to a web-robot,

said robot comprising means for extracting information from web-based data sources (DS) in dependency of at least one extraction model (EM), said at least one extraction model comprising reference data structures defining entities and/or entity structures of data sources in a domain.

When said robot comprises at least one exchangeable plug-in, said plug-in comprising retrieving routines adapted for reading knowledge stored in said extraction model, said knowledge preferably being domain-specific, a further advantageous embodiment of the invention has been obtained.

When said plug-in defines reference mapping between extracted data obtained according to said extraction model (EM) and conceptual representation of said data, a further advantageous embodiment of the invention has been obtained.

When said extraction model (EM) is shared between at least two robots, a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a query processor (QP),

said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),

said query processor (QP) comprising at least three query processor elements (QPE),

at least two of said query processor elements (QPE) comprising

a robot (RQPE)

said robot (RQPE) being attached to at least one data source (DS)

said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),

at least one of said query processor elements (QPE) comprising

a trigger (TQPE)

said trigger query processor element (TQPE) comprising means for establishing a query.

The web-based data sources are typically independent.

The trigger element may be both manually and automatically driven, i.e. by a query user or an automated query routine.

When at least one of the query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a method of establishing at least one query processor (QP),

said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),

said query processor (QP) comprising at least three query processor elements (QPE),

at least two of said query processor elements (QPE) comprising

a robot (RQPE),

said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),

at least one of said query processor elements (QPE) comprising

a trigger (TQPE),

said trigger query processor element (TQPE) comprising means for establishing a query,

said method comprising the steps of

-   attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,
-   combining the selected query processor elements into a query processor (QP) by means of a graphical user interface (GUI).

It should be noted that the data source may be regarded as either an internal part or an external part of the query processor within the scope of the invention, depending on whether the associated data source is defined by its data or not.

When said graphical user interface (GUI) defines a query processor element path visually on a drag-and-drop basis, a further advantageous embodiment of the invention has been obtained.

When at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a method of establishing at least one query processor (QP),

said query processor comprising means for accessing data from web-based data sources (DS) of a domain by means of at least one user interface (UI),

said method comprising the steps of

selecting a number of query processor elements (QPE),

at least one of said selected query processor elements (QPE) being a robot query processor element (RQPE),

at least one of said selected query processor elements (QPE) being a trigger query processor element (TQPE),

attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,

combining the selected query processor elements into at least one query path defining the data flow in the query processor (QP) between the user interface (UI) and the web-based data sources of the domain, said method comprising a further step of

customizing the at least one individual robot query processor element (RQPE) to the corresponding attached data sources (DS),

customizing at least one of the trigger query processor elements (TQPE) to the query processor (QP).

When at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a method of extracting data from a web-based data source (DS), said method comprising the steps of

-   identifying and reading attributes and entities of a web-based data source,
-   converting the read entities into instances of conceptual entities,
-   verifying whether the read instances correspond with an entity reference base (ERB).

According to the above-mentioned embodiment of the invention, very advantageous entity processing has been obtained.

A conceptual model may also include a storage database model.

When the method comprises at least one step of verifying whether the read instances correspond with an entity reference base (ERB) on the basis of entities represented in said conceptual entity-representing format, a further advantageous embodiment of the invention has been obtained.

According to the invention, very advantageous processing of entities has been obtained. Hence, a conceptual check of the data may be performed on compactly represented data, thereby reducing processing significantly. Hence, according to the invention, the micro-interpretation of the read entities and attributes is made separately, and prior to macro-interpretation of the entities.

Micro-interpretation according to the invention may be regarded as the reading of individual string-based attributes on a web-based data source. According to the preferred embodiment of the invention, the combination of read string-based attributes into entities may also be regarded as micro-interpretation performed according to the extraction model.

An example of micro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether a read attribute is a “Ford” or a “Fiat”. A further example is the determination of whether an engine is a 75 or 155 Hp engine.

Entities held in an extraction format are typically string-based, e.g. Fiat, “Fiat”, FIAT, FIATH, etc.

Entities held in a conceptual format are typically held in an object-like format. Hence Fiat, “Fiat”, FIAT, FIATH are all represented as a Fiat-type in the conceptual format. Such a Fiat type may typically involve an integer representation of a Fiat in old databases whereas new databases may represent Fiat, “Fiat”, FIAT, FIATH as a “Fiat”.
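The step from string-based extraction format to conceptual format might be sketched as below. The table `KNOWN_MAKES`, the function `micro_interpret_make` and the use of approximate string matching with a cutoff of 0.8 are assumptions made for illustration; the text does not prescribe how variants such as FIATH are recognized.

```python
import difflib

# Hypothetical conceptual make types, keyed by their normalized spelling.
KNOWN_MAKES = {"fiat": "Fiat", "ford": "Ford"}


def micro_interpret_make(raw: str):
    """Map a raw, string-based make attribute to a conceptual make type, or None."""
    # Normalize: drop surrounding whitespace and quotation marks, ignore case.
    cleaned = raw.strip().strip('"“”').lower()
    # Tolerate small misspellings such as "FIATH" through approximate matching.
    match = difflib.get_close_matches(cleaned, list(KNOWN_MAKES), n=1, cutoff=0.8)
    return KNOWN_MAKES[match[0]] if match else None
```

With this sketch, the four variants Fiat, “Fiat”, FIAT and FIATH all resolve to the same conceptual type, while a make absent from the table is left unresolved.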

Macro-interpretation according to the invention may typically be regarded as a syntax check performed on the basis of the complete and established instance. Such a check may e.g. be performed with the purpose of verifying whether the established instance of an entity is actually realistic, i.e. consistent.

Moreover, the conceptually held entities may easily be grouped and filtered and conceptual checks may evidently be performed relatively easily.

Conceptual representation of the entities according to the invention is typically an object-oriented representation.

An example of macro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether read attributes combined into an entity “Fiat”, “120 Hp” and 2.0 liter engine are actually valid. Such a check performed on the basis of a reference base of known (valid) entity types, i.e. a product catalogue, may moreover be performed with the purpose of adding information to the checked instances of entities. Such procedure may be regarded as a deduction of information exemplified by an instance of a car, “Fiat”, “155 Hp” and 2.0 liter. When compared with a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. “Fiat”, “155 Hp”, “2.0 liter” and TURBO.
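The deduction described above might be sketched as a lookup in a small reference catalogue. The two catalogue entries merely restate the Fiat examples of the text and are not real product data; the names `CATALOGUE` and `macro_interpret` and the attribute keys are hypothetical.

```python
# Hypothetical reference product catalogue for the car domain (illustrative entries only).
CATALOGUE = [
    {"make": "Fiat", "hp": 120, "liters": 2.0, "turbo": False},
    {"make": "Fiat", "hp": 155, "liters": 2.0, "turbo": True},
]


def macro_interpret(instance: dict):
    """Validate an instance against the catalogue and add the attributes it implies.

    Returns the enriched instance if exactly one catalogue entry agrees with every
    attribute that was read, and None if the instance is unknown or ambiguous.
    """
    matches = [
        entry for entry in CATALOGUE
        if all(entry.get(key) == value for key, value in instance.items())
    ]
    if len(matches) != 1:
        return None
    return dict(matches[0])
```

Under these assumptions, an instance read as “Fiat”, 155 Hp and 2.0 liter comes back with the additional attribute `turbo` set to `True`, whereas an instance matching no catalogue entry is reported as invalid.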

According to the invention, macro-interpretation may be performed on instances held in a conceptual format.

When modifying the verified instances according to the entity reference base (ERB) by adding information associated with said instances corresponding to said entity reference base, a further advantageous embodiment of the invention has been obtained.

Hence, information may be added to the instances, e.g. by adding further attributes, or maybe modifying one or several attributes forming the instance of an entity slightly.

An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, “Fiat”, “155 Hp” and 2.0 liter. When compared with a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. “Fiat”, “155 Hp”, “2.0 liter” and TURBO.

A storage model may typically be relational.

When correcting the verified instances according to the entity reference base (ERB) by correcting information associated with said instances corresponding to said entity reference base, a further advantageous embodiment of the invention has been obtained.

Hence, instances may be corrected, e.g. by omitting attributes held in the instance or by modifying one or several attributes forming the instance of an entity.

An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, “Fiat”, “120 Hp”, 2.0 liter engine and Turbo. When compared with a reference product catalogue associated with the car domain, the verification of the instance may result in a correction of the “Turbo” attribute, as the verification procedure may conclude both (a) that no 120 HP Fiat having Turbo is in the reference catalogue, and (b) that a 120 HP Fiat without Turbo is most likely the true intended instance of a car. Consequently, a correction routine may correct the instance accordingly or discard the entity entirely.
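A correction routine of this kind might be sketched as follows. The catalogue entries again only restate the Fiat examples of the text, and the list `CORRECTABLE`, which names the attributes assumed to be unreliable enough to be overwritten, is an assumption introduced here; the text leaves open how the routine decides which attribute is the wrong one.

```python
# Hypothetical reference product catalogue for the car domain (illustrative entries only).
CATALOGUE = [
    {"make": "Fiat", "hp": 120, "liters": 2.0, "turbo": False},
    {"make": "Fiat", "hp": 155, "liters": 2.0, "turbo": True},
]

# Assumption: only these attributes are considered unreliable enough to be corrected.
CORRECTABLE = ["turbo"]


def verify_and_correct(instance: dict):
    """Return a valid instance, correcting one suspect attribute if needed, else None."""
    if instance in CATALOGUE:
        return instance  # already valid according to the reference catalogue
    for attribute in CORRECTABLE:
        # Look for catalogue entries that agree on everything except the suspect attribute.
        others = {key: value for key, value in instance.items() if key != attribute}
        candidates = [
            entry for entry in CATALOGUE
            if all(entry.get(key) == value for key, value in others.items())
        ]
        if len(candidates) == 1:
            return dict(candidates[0])  # corrected instance
    return None  # no plausible correction: discard the entity
```

In this sketch the instance “Fiat”, 120 Hp, 2.0 liter with Turbo is corrected to the catalogue entry without Turbo, while an instance that cannot be reconciled with the catalogue is discarded.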

Moreover, the invention relates to a method of establishing a query processor,

said query processor being adapted for accessing data on at least two different web-based data sources,

selecting at least two predefined query processor elements (QPE),

combining the selected query processor elements into a desired query processor structure.

According to the invention, the overall structure of a query processor may be purely based on some basically intended design rules, i.e. a robot element must be assigned to a data source, a trigger must feature a manual user interface, a database element must contain retrieved database element, etc.

Such a conceptual design of a query processor should preferably be made by means of a graphically-based visual program, e.g. a drag-and-drop-like design program.

Evidently, this conceptual programming of a query processor may be made on the basis of more or less structured knowledge about the domain and the data sources of the domain.

Basically, such a design of a query processor represents the framework for the intended query processor.

The query processor elements basically represent different sub-frameworks which may all be designed and performed in separate structures or routines. Therefore, the design of query processors by means of different functional properties minimizes “error cross-talk” between the elements and the elements may advantageously be put together initially without dealing with complicated details of the individual elements.

A query processor according to the invention is established for accessing data of at least two different independent web-based data sources.

A further advantage of the above-mentioned method is that a break-down of the functional features of a query processor into standardized elements, which may be configurable, may easily be conceived by a programmer.

A further advantage of the invention is that utilization of standardized elements facilitates the possibility of pre-configuring different variants of a certain element type, thereby offering the possibility of inserting a pre-configured element to the user.

An example of such pre-configuration of elements may e.g. be a trigger element. Within the (type) group of trigger elements, several variants may be pre-established with great advantage if such trigger elements are utilized often. Therefore, a programmer may e.g. apply a trigger element predefined for triggering a query at certain time intervals. Other types of trigger elements may e.g. be triggers comprising a statistic module applicable for triggering a query according to different system parameters. A third possible type of triggers may e.g. be a manually operated trigger intended for establishment of a query in cooperation with a manually operated user interface.
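Two of the pre-configured trigger variants mentioned above might be sketched as follows; the statistics-driven variant is omitted for brevity. The class names `IntervalTrigger` and `ManualTrigger`, their methods and the dictionary used to represent a query are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntervalTrigger:
    """Hypothetical trigger establishing a query at certain time intervals."""
    interval_s: float
    last_fired: float = float("-inf")  # guarantees that the first poll fires

    def poll(self, now: float):
        """Return a new query if the interval has elapsed since the last one, else None."""
        if now - self.last_fired >= self.interval_s:
            self.last_fired = now
            return {"source": "interval", "time": now}
        return None


class ManualTrigger:
    """Hypothetical trigger establishing a query from input given in a user interface."""

    def submit(self, user_input: str):
        """Wrap the text entered by the user into a query."""
        return {"source": "manual", "text": user_input}
```

Because both variants produce the same kind of query object, either could be dropped into the same position of a query path, which is the point of offering pre-configured elements of one type.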

Basically, the invention offers a high-level language facilitating easy web-based access.

When said at least two predefined query processor elements have different functional characteristics, an advantageous embodiment of the invention has been obtained.

Different functional characteristics may e.g. be elements functioning as converters, triggers, caches, robots.

Hence, a query processor according to the invention may be established by means of standardized “bricks”, thereby doing away with the establishment of a web-oriented query processor being extremely complicated.

When modifying the selected query processor elements according to the data structure of said web-based data sources, a further advantageous embodiment of the invention has been obtained.

According to the invention, the different elements may be configured or designed independently. Hence, the individual elements may be established so as to fit the individual task(s) of the elements without inducing errors somewhere else in the processing system.

When said modification of the selected query processor elements comprises at least one plug-in software module, said at least one plug-in defining domain-specific properties of said element, a further advantageous embodiment of the invention has been obtained.

Hence, domain-specific plug-ins may initially be constructed, e.g. product catalogues, language dictionaries, as completely separate routines. Moreover, the individual elements may be ideally constructed, e.g. a robot, with no or only little knowledge of the language of the data source due to the fact that the basic structure and functioning of the robot is language independent. Product catalogues should likewise be domain specific.

Moreover, the individual elements may be established with different plug-ins.

Moreover, the invention relates to a method of establishing a domain-accessing routine,

said domain comprising a plurality of web-based data sources,

said method comprising the steps of

establishing at least one robot ( ) adapted for retrieving entities stored on said plurality of web-based data sources,

establishing at least one reference catalogue,

establishing at least one procedure of verifying the retrieved entities by comparing the read entities with the at least one reference catalogue.

Thereby, an ideal way of retrieving information from a web-based data source has been obtained.

When said method comprises the steps of

establishing at least one storage means,

establishing a data-exchanging interface between said at least one robot and at least one storage means, a further advantageous embodiment of the invention has been obtained.

When said reference catalogue is a product catalogue, a further advantageous embodiment of the invention has been obtained.

When said established procedure of verification comprises a modification of the retrieved entities if the verification procedure indicates or proves that a read entity is not valid according to the at least one reference catalogue, a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a query processor maintenance manager (QMM)

comprising at least one domain processor user interface (DPUI),

said manager (QMM) comprising means for evaluating different modules of at least one query processor (QP),

said means for evaluating different subroutines of said query processor comprising

-   means for monitoring the state of at least one query processor element (QPE).

Hence, the query processor may comprise means for monitoring a robot element, a transformer element, a trigger element, a mediator, etc.

When said processor comprises means for automatically forwardingmessages to said at least one query processor user interface (DPUI) whencertain predefined conditions are met, a further advantageous embodimentof the invention has been obtained.

The predefined conditions may e.g. be conditions determining that atransformer has failed to transform extracted entities into conceptualentities.

A further predefined condition may be that a maximum load of an element,e.g. a cache or a robot, has been exceeded.

When said manager (QMM) comprises means for modifying individual queryprocessor elements/sub-routines, a further advantageous embodiment ofthe invention has been obtained.

The means for modifying individual query processor elements/sub-routinesmay e.g. comprise an editor for the robots or means for modifyingplug-ins centrally.

An example of such an editor may e.g. be the interface of a QueryProcessor Modeler in which the individual query processor elements maybe edited simply by clicking on the elements and thereby starting theeditor related to the activated element. Such an editor may e.g. be aRobotmaker, if a robot is clicked on, or a domain modeler if atransformer element is clicked on.

When said manager (QMM) comprises means for modifying the query flow in the query processor during execution of the query processor, a further advantageous embodiment of the invention has been obtained.

When allowing realtime editing in the query processor, the up-time of the query processor may be maximized. This realtime editor should preferably comprise means for blocking different query paths of the query processor without invoking fault conditions on the associated signal paths.

An example of means for modifying the query flow may e.g. comprise a mute element included in a query path. The activation of such a mute element may then cause the involved branch to be out of work, whereas the rest of the query processor may proceed unaffected, except insofar as queries or entities (i.e. data) from the muted branch are needed to proceed with the query. Typically, missing the queries and entities from one branch of the query processor is preferable to closing the complete query processor down.

Meanwhile, the elements of the muted branch, e.g. a robot or a transformer, may be “repaired” or updated without resulting in run-time errors.

A further advantageous variant of the above-mentioned modification may be a halt routine acting as the above-mentioned mute but including a memory which may catch and store queries, and subsequently resume processing by means of the caught and stored queries.
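The mute and halt behaviour described above may be sketched as follows. This is a minimal illustration only: the class `Branch`, its `mode` values and the function `run_query` are hypothetical names that do not appear in the specification, and each branch handler merely stands in for a robot or transformer.

```python
from collections import deque


class Branch:
    """One query path; `handler` stands in for a robot or transformer."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.mode = "active"      # "active", "muted" or "halted" (assumed names)
        self.pending = deque()    # queries caught while the branch is halted

    def submit(self, query):
        if self.mode == "muted":
            return []             # branch contributes nothing and raises no fault
        if self.mode == "halted":
            self.pending.append(query)   # catch and store the query for later
            return []
        return self.handler(query)

    def resume(self):
        """Reactivate the branch and process the stored queries in order."""
        self.mode = "active"
        results = []
        while self.pending:
            results.extend(self.handler(self.pending.popleft()))
        return results


def run_query(branches, query):
    """Fan a query out to all branches and merge whatever comes back."""
    merged = []
    for branch in branches:
        merged.extend(branch.submit(query))
    return merged
```

In this sketch a muted branch simply yields no entities, so the remaining branches keep answering, whereas a halted branch additionally keeps the queries it missed and replays them when `resume` is called.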

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described below with reference to the drawings of which

FIG. 1 illustrates some basic principles of a query processor system,

FIG. 2 illustrates a basic approach according to the invention when dealing with domain processing,

FIG. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention,

FIGS. 4 to 6 illustrate the principles of one embodiment of a domain modeler according to one embodiment of the invention,

FIG. 7 illustrates the principles of an applicable robot-making program according to one embodiment of the invention,

FIG. 8 illustrates the functionality of a query processor modeler according to one embodiment of the invention, and

FIG. 9 illustrates a possible user interface of a domain execution manager.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 a illustrates the basic principles of a web-based market place.

A web-based market place generally comprises a number of web-based data sources DS. The data sources are e.g. web-sites associated with a homepage of a data source owner. Typically, the data are transferred according to the HTTP protocol. Other protocols, e.g. the WAP protocol or HTTPS, are also applicable.

The data sources DS are typically a database or they are powered by a database DB of the data owner.

It should be noted that a marketplace may moreover comprise non-web-based data sources accessed by means of e.g. ODBC drivers.

The data sources offer information, products, services, etc. free or for sale.

According to the invention, a market place should technically deal with one domain only, but evidently, several domains may be overlaid and thereby offer a market place dealing with different domains.

An example of such a domain may e.g. be a car market place. The cars of the domain are offered for sale on the individual web-based data sources DS, and the cars may be new or used. A domain may include different nationalities of data sources and be in many languages. On the other hand, a car market place offering used cars would typically only comprise cars offered for sale in one country.

Other exemplary domains may be jobs, services, stocks, odds, boats etc.

It should be noted that web-based access to the data sources facilitates a very broad coverage of the entire domain due to the fact that web-based data sources may be accessed without any kind of cooperation between the accessing party and the data source owner. Typically, the data sources will be independent.

According to the invention, the content of the data source of the domain will be regarded as entities. An entity has different properties, here defined as attributes.

An example of an entity is a specific car offered for sale, e.g. a Porsche, and attributes may be color, e.g. black, engine, e.g. 3.0 liters, etc.

Another example of an entity is a specific boat described by a number of suitable boat-describing parameters, or attributes, such as length, price, year, etc.

When reading a web-based data source DS, a combination of attributes will typically be read and interpreted as a car. Such reading of attributes may be regarded as an extraction of information from the web-based data source according to the invention.

The data sources DS may be accessed both by reading and/or writing.

The data sources may be accessed via a domain handling system, i.e. a processing system, implemented by software in hardware on the illustrated computer system CS. The computer system may comprise one central server or a number of coupled servers located centrally or decentrally. Such a system may be regarded as a query processor QP. The query processor is adapted for querying the data sources automatically or upon request, a query Q, made by a user U. The request is performed by means of a user interface implemented on a user platform UPF.

As illustrated in FIG. 1 b, a User Platform UPF typically comprises a computer-based user interface which may be manually operated by a user U.

Hence, a user may forward a query Q to the data sources DS via the query processor QP. The query may be processed in many steps and the query processor QP may also include a data cache or a database for storing entities retrieved from the data sources DS for statistical purposes or for speeding up the query process.

The individual web-based data sources are accessed (i.e. read and/or written) by means of robots attached to the data sources. Typically, one robot is uniquely assigned to a corresponding data source DS.

The definition of a robot differs significantly from the somewhat popular definitions and the more scientific definitions.

The definition adopted in this application is that a robot is a kind of automatic process established with the purpose of accessing web-based data. A robot is a sub-arrangement of a so-called agent.

According to the invention, a robot is a software-based automatic process established with the purpose of accessing web-based data sources. According to the invention, a robot may even comprise some kind of intelligence embedded in the process establishing elements. It should be noted that a robot according to this definition may even be regarded as an agent by some practitioners within the art.

According to the invention, the agent has no personality, and it is not autonomous, nor mobile, in the sense that the agent is free to be transferred and processed on the local data source servers of the data source owners. A robot according to the invention is established for remote execution in relation to the data sources to be accessed and the robots will only be executed in a particular server environment. It should be noted that this particular environment may obviously include several servers located at different places.

Again, it should be noted that non-web-based data sources may be added if desired.

FIG. 1 c illustrates the complex nature of a data source to be accessed according to the invention. The illustrated data source DS has a data structure which is initially unrevealed and incompatible with the access tools of the retrieving profile associated with the specific data source DS.

According to the illustrated embodiment, the character-based information of the data source DS has been converted into a number of attributes of identified text strings. Evidently, attributes may be encoded and decoded in various formats such as character-based formats, image-based formats and active content formats, such as Java applets, JavaScript applications or VBScript applications.

The text strings may e.g. be a mix of text strings identifying car names, model names, numbers, etc.

Subsequently, the data source must be evaluated and interpreted according to an extraction model in order to facilitate access to hidden information by the retrieving profile RP.

FIG. 1 d illustrates identification and categorization of attributes of a data source according to the invention.

The attributes, i.e. the text strings of the data source, may subsequently be interpreted and combined into so-called entities of associated attributes ASA. The associated attributes may be established so as to comprise certain predefined types of attributes, i.e. categorized attributes.

An example of an entity is a car entity comprising the categorized attributes CA “Trabant”, '88 and $100,000 where the first attribute of the category is car model, the second attribute of the category is manufacturing year and the third attribute of the category is the price. The above-mentioned entity may also be referred to as an instance of an extraction model. The extraction model defines and describes certain attributes and entities of interest for the domain.

Each entity is established as a set of associated attributes ASA, and the irrelevant attributes are filtered away.
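The establishment of an entity as a set of categorized attributes, with irrelevant attributes filtered away, may be illustrated by the following minimal sketch. The category names in `CAR_EXTRACTION_MODEL`, the function `build_entity` and the "banner" attribute are assumptions introduced for illustration only.

```python
# Hypothetical extraction model for a car domain: the categories of interest.
CAR_EXTRACTION_MODEL = ("model", "year", "price")


def build_entity(raw_attributes, model=CAR_EXTRACTION_MODEL):
    """Keep only categorized attributes named in the model; drop the rest."""
    return {category: raw_attributes[category]
            for category in model if category in raw_attributes}


# Text strings as they might be read from a data source (illustrative data).
raw = {"model": "Trabant", "year": "'88", "price": "$100,000",
       "banner": "Click here!"}   # "banner" is irrelevant to the domain
entity = build_entity(raw)
```

Here `entity` holds the three categorized attributes of the Trabant example, and the attribute not covered by the extraction model is dropped.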

Evidently, the establishment of entities of associated attributes may be performed in several different ways, and more or less automatically, within the scope of the invention. It should be noted that the preferred embodiment of the invention implies a completely automatic establishment of as many robots as possible.

A detailed description of a semi-automatic robot establishment according to one embodiment of the invention is given with reference to FIGS. 7 to 9.

Subsequently, the identified entities may be copied into the central database DB in such a way that the retrieving profile initially performs a query in the database instead of visiting every involved data source DS and lists the results to the user according to a predefined listing format. This feature ensures quick access to the search result. If the user U requires additional information, this information may be obtained by means of a link contained in the above-mentioned result list.

When the entities have been copied to the database and associated with the retrieving profile, further information is added to the retrieving profile in the form of a robot adapted to the data structure of the specific data source. This robot is associated with the retrieving profile in order to visit the data source according to certain trigger criteria and to reevaluate the data source in order to determine whether the contents of the data source have been changed. Hence, the robot will access the data source e.g. at certain intervals and update the contents of the database if changes have occurred. Such an automatically handled change may take place if e.g. one entity has been removed from the data source and replaced by two other entities when the removed entity represents a sold car and the two new entities represent cars introduced for sale.

Such a change observed by the robot should of course be reflected in the database, as the sold car has to be removed and the two cars be added to the database in order to reflect the state of the data source when the data source is visited.
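The update of the database after a revisit may be sketched as a comparison between the stored entities and the entities observed by the robot. The functions `diff_entities` and `refresh`, the dictionary used as a database and the sample cars are illustrative assumptions, and entities are represented as plain tuples for simplicity.

```python
def diff_entities(stored, observed):
    """Return (removed, added) entity sets between two visits."""
    stored, observed = set(stored), set(observed)
    return stored - observed, observed - stored


def refresh(database, source_id, observed):
    """Bring the stored entities of one data source in line with the visit."""
    removed, added = diff_entities(database.get(source_id, ()), observed)
    database[source_id] = set(observed)
    return removed, added   # may also be logged for statistical purposes
```

With one sold car and two newly introduced cars, `refresh` reports one removed and two added entities and leaves the database reflecting the state of the data source at the time of the visit.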

A change may likewise be stored and registered for statistical purposes in another database.

If, on the other hand, the data structure of the data source has changed in such a way that the robot is no longer able to extract the correct information, an error is reported to the retrieving profile. Such an error results in the establishment of a new robot fitting the new structure of the data source.

It should be noted that each data source typically requires a dedicated robot.

FIG. 2 illustrates three entity models applied in a preferred embodiment of the invention.

The three entity models are an extraction model EM, a conceptual model CM and a storage model STM.

For reasons of simplicity, entities according to the three models are referred to as extraction entities EENT, conceptual entities CENT and storage entities SENT. The entities are also referred to in three different formats, i.e. an extraction format, a conceptual format and a storage format.

The entity flow is transformed between the different formats by means of converters established for converting the data from one format into another. According to the invention, the converters may preferably be established as so-called transformer elements which will be dealt with in detail below.

Starting from the web-based data source end, upstream, the entities are accessed according to an extraction model preferably common for all involved data sources of the domain. The extraction entities simply comprise a serial stream of strings. According to the extraction model, the strings are ordered in such a way that the receiver of the string-stream may recognize what the transmitter actually intends to transmit. This may be established both with accompanying codes or simply as a convention defining the sequence.

In fact, the extraction model represents more than a data format. It also defines the different attributes which the robots should access when dealing with the different data sources. In other words, the extraction model represents a framework in which the designers may design the robots. The robot designers may therefore concentrate fully on designing a robot capable of accessing the attributes contained in the extraction model and on combining the attributes into entities according to the extraction model, i.e. extraction entities.

The extraction entities may nevertheless be established e.g. wholly or partly by automated extraction routines. In a certain web-based data source, such routines may e.g. be adapted for automatic reading of the data source representation, automatic recognition of attribute patterns of the web-based data source, and outputting of these attributes as extraction entities according to the extraction model.

Moreover, such automated routines may evidently be adapted for assigning the specifically discovered attribute/entity patterns of a data source to a corresponding robot.

According to the preferred embodiment of the invention, the extraction model may be established by means of a domain modeler DMR.

The extraction entities may then be converted, e.g. by a transformer, into conceptual entities. Among other things, the conceptual model representation of an entity involves a conversion of the individual entity into a unique object. In a simplified manner, an extraction entity comprising a string stream of “Porsche”, “Red”, “3.0”, “Diesel” is converted into a unique car object, a conceptual entity, being a Porsche which is red and with a 3.0 liter diesel engine.

The conceptual format moreover offers the possibility of handling the entities in a compact way. Now, the entities may be represented in an object-oriented manner instead of a flat string format.
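The conversion of a flat string stream into an object-oriented conceptual entity may be sketched as follows. The class `Car`, the field names and the ordering convention `EXTRACTION_ORDER` are assumptions made for illustration; the specification only states that the sequence of strings is fixed by the extraction model.

```python
from dataclasses import dataclass

# Assumed convention: the extraction model fixes the order of the strings.
EXTRACTION_ORDER = ("make", "color", "engine_liters", "fuel")


@dataclass(frozen=True)
class Car:
    """Conceptual entity: a typed object rather than a flat string stream."""
    make: str
    color: str
    engine_liters: float
    fuel: str


def transform(extraction_entity):
    """Convert one extraction entity (a sequence of strings) into a Car."""
    if len(extraction_entity) != len(EXTRACTION_ORDER):
        raise ValueError("string stream does not match the extraction model")
    fields = dict(zip(EXTRACTION_ORDER, extraction_entity))
    return Car(fields["make"], fields["color"],
               float(fields["engine_liters"]), fields["fuel"])


car = transform(["Porsche", "Red", "3.0", "Diesel"])
```

Once the entity is held as an object, later elements can address `car.fuel` or `car.engine_liters` directly instead of re-parsing strings.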

Moreover, a conceptual approach to the entities offers the possibility of adding knowledge to the retrieved entities. Such information may e.g. be information deducible from a reference product catalogue. Thus, if an entity is matched with an entity type of the product catalogue, the entity may be modified, e.g. as a validation, a correction or as an insertion of additional information about the entity.

A correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue. This false attribute may be detected in several different ways within the scope of the invention. The reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine. Moreover, the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute “Diesel” may then be corrected.

Insertion of added information may e.g. follow from recognizing that a Porsche of the above-mentioned type (now assuming that the diesel statement has not been made) has electronic injection. This information may then be inserted as a new attribute of the unique conceptual entity Porsche or in the fill-in of a text field attribute of the Porsche.

Validation comprises the step of evaluating whether the currently investigated conceptual entity should be regarded as a valid entity at all. Such validation may basically result in the entity being accepted as a valid entity or in the entity being discarded. Subsequently, a valid entity may be further processed with the purpose of deducing information about the entity as described above.
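Validation, correction and insertion against a reference product catalogue may be sketched in one routine. The catalogue contents, the attribute names and the rule of discarding an entity whose wrong attribute cannot be corrected unambiguously are all assumptions of this sketch and are not prescribed by the specification.

```python
# Hypothetical reference catalogue; the contents are illustrative only.
CATALOGUE = {
    "Porsche": {"fuels": {"Petrol"}, "extra": {"injection": "electronic"}},
}


def check_against_catalogue(entity, catalogue=CATALOGUE):
    """Return (entity, status); status is 'discarded', 'corrected' or 'valid'."""
    entry = catalogue.get(entity.get("make"))
    if entry is None:
        return None, "discarded"           # validation: not a known entity type
    result = dict(entity)
    status = "valid"
    fuels = entry["fuels"]
    if result.get("fuel") not in fuels:
        if len(fuels) != 1:
            return None, "discarded"       # assumed rule: no unambiguous fix
        result["fuel"] = next(iter(fuels)) # correction of the false attribute
        status = "corrected"
    result.update(entry["extra"])          # insertion of deduced information
    return result, status
```

For example, `check_against_catalogue({"make": "Porsche", "fuel": "Diesel"})` yields a corrected entity with the catalogue's fuel value and the inserted injection attribute, while an entity of unknown make is discarded.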

A discarded entity may result in a further investigation of the original data source with the purpose of evaluating whether an entity has been overlooked. Evidently, a realtime evaluation of the discard rate of each data source should be performed with the purpose of monitoring whether the robot or the extraction model associated with the individual data source needs an update or replacement.
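A running evaluation of the discard rate per data source may be sketched as below. The class `DiscardMonitor`, the threshold of 0.5 and the minimum sample count of 10 are arbitrary assumptions chosen for illustration.

```python
from collections import defaultdict


class DiscardMonitor:
    """Tracks, per data source, how many entities were seen and discarded."""

    def __init__(self, threshold=0.5, min_samples=10):
        self.threshold = threshold      # assumed alert level
        self.min_samples = min_samples  # avoid alerts on too little data
        self.seen = defaultdict(int)
        self.discarded = defaultdict(int)

    def record(self, source_id, was_discarded):
        self.seen[source_id] += 1
        if was_discarded:
            self.discarded[source_id] += 1

    def rate(self, source_id):
        seen = self.seen[source_id]
        return self.discarded[source_id] / seen if seen else 0.0

    def needs_attention(self, source_id):
        """True when the robot or extraction model may need an update."""
        return (self.seen[source_id] >= self.min_samples
                and self.rate(source_id) > self.threshold)
```

A data source whose discard rate rises above the threshold would then be a candidate for a robot update or replacement.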

Typically, every possible attribute of a conceptual entity should be predefined in the conceptual model. According to a preferred embodiment of the invention, the conceptual entities and attributes should be established by means of a domain modeler.

The conceptual model should typically be made by people having a certain kind of knowledge about the domain. It should, nevertheless, be emphasized that the establishment of relevant attributes may be heavily supported by automated procedures traversing through the domain and identifying the offered combinations of attributes.

The last entity model is the storage model. The storage model is primarily adapted for applying traditional database structures and database handling methods to the retrieved entities. Thus, the modeling of a storage model may be performed with very little knowledge of the nature of the domain but more or less by focussing on the involved attributes and entities.

Evidently, other entity format approaches may be applied within the scope of the invention. Specifically, the distinction between the different models may be softened up a little in the sense that the conceptual model and the data storage model may more or less be incorporated in one body.

Evidently, the invention features the possibility of performing centralized processing when data retrieved from the different data sources are represented according to a generalized entity model, e.g. a conceptual model.

The extraction format may be understood as an analogue format while the conceptual/storage format may be regarded as a digital format.

The extraction entities are typically entities extracted directly from the web-based data sources, the conceptual entities are typically the entities flowing in the heart of the query processor capable of more complex processing, and the storage entities are typically the entities represented in e.g. a relational database.

It should be emphasized that the different models, e.g. the above-mentioned extraction model EM, conceptual model CM and storage model STM may facilitate an entity flow both ways; downstream as described above from the data sources to the user querying the query processor, or upstream from a user submitting an entity or a request, e.g. an order to a certain data source.

If, for instance, a user wants to buy an item found in the domain, he may then submit an order associated with a chosen entity, e.g. a PC, car, etc. This order would comprise the selected item as a storage or conceptual entity which is subsequently converted in the query processor and submitted to the relevant data source according to the extraction model. An extraction model according to the invention may thus both be defined as a way of reading the data source and it may be defined as a way of writing (submitting) entities into the data source, e.g. by means of a form into a shopping cart of the data source or a data search form associated with the relevant data source.

Preferably, the two functions, reading and writing, should be supported by two separate distinct models for the purpose of clarity, i.e. one model for reading the data source, an extraction model, and one model for writing to a data source, a submission model.

The first format, the extraction format, is the format in which the entities are accessed in the web-based data source. This format is evidently a little fragile and unhandy due to the fact that this string-based entity stream is primarily based on transmission of data supposed to be entities and attributes of entities. This fragile extraction format may typically not be supported significantly by validity checks due to the fact that the extracted entities are difficult to process on a large scale. Such processing would involve major complex string-based processing.

The conceptual format is established on the basis of the predefined conceptual model defining the basic nature of the entities of the domain. The conceptual representation may fundamentally be regarded as an object-oriented representation of the read entities. A conceptual representation of the read entities is relatively easy to process in the sense that the entities are converted into unique instances of the conceptual model, thereby offering filtering, conversion or modification of any information related to the individual instances of predefined information, e.g. attributes, types of attributes etc. consistent with the conceptual model.

The storage format is basically intended for storing the retrieved entities for later access. The storage format represents a more handy representation of the retrieved entities of the domain in the sense that superfluous information, e.g. information contained in or related to the conceptual model may be omitted. Such information may e.g. be entity information utilized for converting the extraction entities into conceptual entities. Such information need no longer be present in the storage model as the entities are now conceived as unique entities.

The entities stored in a database according to the storage model may (and should) instead be used for statistical purposes.

The conceptual model and the storage model may be more or less overlapping but, preferably, these formats should be dealt with separately, thereby obtaining the possibility of reusing the storage model and even the conceptual model in other applications. Moreover, the strict separation between the applied data models allows the individual models to be modified individually without considering interaction with the other models under some circumstances. An example of such a simple modification of a model is the modification of a classification module which may basically be established without any modification of other modules as long as no new entity attributes have been introduced or removed.

A part of the extraction model may be global or at least multiple in the sense that this part of the model may contain general plug-ins of the extraction model applicable for many or all data sources to be accessed. An example of such general plug-ins may e.g. be a language dictionary defining different applicable languages, e.g. English, Japanese, French or Danish. Moreover, the language dictionary may contain a domain-specific dictionary focussing on the entities characterizing the domain.

FIG. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention.

It should be noted that the establishment of the components and logistics needed for collecting data from a domain and the maintenance of the components may be performed in other ways within the scope of the invention.

Initially, the main steps to be introduced below with reference to FIG. 3 will be described briefly. A thorough discussion of the steps and the meaning of these steps will be made below with reference to the subsequent figures.

Initially, it has been decided that a new domain must be established. This domain may e.g. be a domain comprising boats offered for sale which are either used or new.

The boats are offered for sale from different web-based market places, typically the homepage of a dealer or e.g. private homepages.

As discussed later, web-based data sources may be supplemented by e.g. direct reading in a dealer's database, e.g. by means of ODBC based reading. Nevertheless, the domain should basically always be located in at least two different web-based data sources.

Moreover, the web-based data source may typically be accessed without the consent or knowledge of the web-based data source owner. Consequently, there are no strict sign-up requirements by the data source owner. Therefore, the data fundament of the domain is huge, insofar it more or less includes all entities offered for sale in the complete worldwide web.

The decision that a new domain ND has to be made initially invokes the Domain modeler DMR to establish the characteristics of the domain. These characteristics are to be used when establishing the different technical measures needed for accessing the web-based data sources. Details of the functioning of the very important Domain Modeler DMR will be discussed later. It should be noted that the domain modeler may operate more or less automatically.

According to a preferred embodiment of the invention, the domain modeler DMR outputs a specific Domain model DM needed for the different software modules, also named elements, to be used when establishing the query processor for the domain. Hence, the elements described at a later point may advantageously utilize the domain model DM for different or overlapping purposes. The domain model DM may comprise a knowledge base describing different general features and aspects of the invention so to speak. Such a general knowledge “container” benefits from the fact that the knowledge describing the domain may be established centrally and thereby obtain a compact knowledge structure which may be modified centrally and basically without dealing with complicated details of the different query processor elements. Therefore, the domain model represents a knowledge structure that may be accessed by the different query processor elements simply by defining a so-called plug-in to each or some of the query processor elements. The plug-in may represent a domain reading structure, e.g. JAVA-code, adapted for reading a certain part of the domain suitable for the establishment and functioning of the element. Therefore, different elements may utilize different parts of the knowledge. Moreover, the centrally organized knowledge may be modified centrally, thereby implying that all elements automatically utilize an updated knowledge base with little or typically no modification of the elements or the plug-ins.
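The relation between a central domain model and element plug-ins may be sketched as follows. The classes `DomainModel`, `PlugIn` and `ClassifierElement`, the section names and the sample contents are hypothetical and serve only to illustrate that a central modification reaches the elements without changing them.

```python
class DomainModel:
    """Central knowledge container; sections are plain objects in this sketch."""

    def __init__(self, **sections):
        self._sections = dict(sections)

    def read(self, section):
        return self._sections[section]

    def update(self, section, value):
        self._sections[section] = value


class PlugIn:
    """Gives one element access to the single section it needs."""

    def __init__(self, domain_model, section):
        self._domain_model = domain_model
        self._section = section

    def knowledge(self):
        # Read on every call so that central changes are picked up at once.
        return self._domain_model.read(self._section)


class ClassifierElement:
    """Example element that only needs the product catalogue section."""

    def __init__(self, plug_in):
        self._plug_in = plug_in

    def is_known(self, product):
        return product in self._plug_in.knowledge()
```

Because the plug-in reads the domain model on each call, an update of the catalogue section is seen by the element immediately, with no modification of the element or the plug-in.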

According to the invention, some general knowledge may evidently be decentralized, i.e. put into the individual query processor elements. However, according to a preferred embodiment of the invention, the central knowledge base, or the domain model DM, should be maximized.

A domain model DM may e.g. comprise a reference product catalogue describing all known products of the domain, e.g. a list of different known car models and variants of such models.

Furthermore, the domain model DM may comprise mappings between different entity models applied by the query processor, e.g. conversion mappings between extraction entities, conceptual entities and storage entities.

Furthermore, the domain model may e.g. comprise the extraction, conceptual and storage models.

Also, the domain model may comprise language dictionaries, both domain-specific and more general dictionaries.

By applying a domain model, a change in the domain model may be reflected uniformly in the complete query processor.

The next step, Create Query Processor CQP, initiates the combination of different elements by means of a Query Processor Modeler QPM. Some of the elements combined by the Query Processor Modeler QPM are established by the domain modeler DMR and some of the components are general preestablished elements. Other elements to be used may e.g. be robots intended for accessing the data of the individual sites.

The next step, Create Accessors CA, initiates the assignment of individual robots to specific data sources of the domain. A detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429 filed by the applicant, which are hereby incorporated by reference.

The last step, Maintenance, involves the establishment of different procedures intended for maintaining the query processor. Such procedures may e.g. be establishment of a robot and system monitoring. Such monitoring may e.g. include the monitoring of the load of the software elements/modules and whether the robots actually fit the sites, etc.

Moreover, such procedures may include modifying or exchanging robots if such actions are considered necessary.

Evidently, the chronology of the above-mentioned steps may be modified within the scope of the invention, e.g. by establishing the robots before the query processor is combined in the Create Query Processor step.

FIGS. 4 to 6 illustrate the principles of a domain modeler according to one embodiment of the invention.

Evidently, the user interface providing the domain modeling features to the user may be established in numerous variants within the scope of the invention.

According to the illustrated embodiment, the relations between the tables of the database are made in a selectable “edit” environment. Evidently, a combined view/edit environment is applicable within the scope of the invention.

The illustrated domain modeler comprises an interface having a menu bar comprising four different selectable menus File, Edit, View and Mapping.

FIG. 4 a illustrates that the menu View has been selected. The View menu, which is a Relationships Window, may comprise several menu items: Storage model, Extraction model, Conceptual model and Submission model. The models define the different entity models applied by the complete query processor. Evidently, different kinds of entity models and definitions of entity models may be applied within the scope of the invention.

The term database model may also be referred to as a storage model.

In FIG. 4 a the Database model view has been selected.

The view area VA appearing when selecting Storage Model View illustrates the basic components of the database attached to the domain by means of visual indications of relations between the tables. The database model defines the structure of a database intended for storage and handling of the entities of the domain. A database model is typically a relational database rather than a flat-file database in order to accommodate the knowledge obtained by the query processor.

The Relationships window may be in different “show relationships” modes, e.g. “Show All Relationships” or “Show Direct Relationships”.

The first mode shows all tables of the current database. The other mode shows the tables of the database within the currently selected domain. When selecting the available tables, the viewer will show the relationships to all tables related directly to the selected table.

Basically, this viewing area VA may operate like known visualizing tools adapted for viewing relations between tables of relational databases.

According to the illustrated embodiment, the viewer is in the second mode. An open domain model intended for attachment to a PC distributing domain comprises a PC Equipment table PCE. The illustrated PCE table comprises an ID, DealerID, ProdID and Price. The first is a primary key to the PCE table, while DealerID and ProdID are foreign keys to the tables DCAT and PCAT, respectively.

The PCE table refers to a product catalogue PCAT and a dealer's catalogue DCAT. The product catalogue PCAT is a table of the products attached to the domain and intended for sale. The dealer's catalogue DCAT is a table of the dealers attached to the domain. Finally, the PCE table refers to price.

Evidently, such a PCE table would typically be more complex, e.g. comprising relations of tables comprising further product characteristics such as color, comments to the products, currency, URL etc.
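The relations between the PCE, PCAT and DCAT tables may be sketched as a relational schema, here using an in-memory SQLite database. The `Name` columns of PCAT and DCAT, the column types and the sample rows are assumptions; the specification only names the fields ID, DealerID, ProdID and Price of the PCE table.

```python
import sqlite3

# Table and key names follow the description; the Name columns are assumed.
SCHEMA = """
CREATE TABLE PCAT (ProdID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DCAT (DealerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PCE (
    ID       INTEGER PRIMARY KEY,
    DealerID INTEGER REFERENCES DCAT(DealerID),
    ProdID   INTEGER REFERENCES PCAT(ProdID),
    Price    INTEGER
);
"""


def open_storage():
    """Create an in-memory database with foreign key checking enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = open_storage()
conn.execute("INSERT INTO PCAT VALUES (1, 'Laptop')")      # illustrative data
conn.execute("INSERT INTO DCAT VALUES (1, 'Dealer A')")
conn.execute("INSERT INTO PCE VALUES (1, 1, 1, 999)")
rows = conn.execute(
    "SELECT d.Name, p.Name, e.Price FROM PCE e "
    "JOIN DCAT d ON d.DealerID = e.DealerID "
    "JOIN PCAT p ON p.ProdID = e.ProdID").fetchall()
```

With foreign key checking enabled, a PCE row that refers to a dealer or product missing from the catalogues is rejected by the database.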

When double-clicking on the Price field of the PCE table, the Price field definitions appear as a dialogue box PD. This dialogue box may be applied for defining the Price field. The illustrated Price field has the name “Price” and the field type may be selected as a string or an integer, here selected as an integer.

FIG. 4 b illustrates that the menu Mapping has been selected. The Mapping menu, which is a table or Relationships Window, may comprise several menu items, e.g. the illustrated EM to CM, CM to STM, STM to CM or CM to SM.

The first-mentioned mappings, EM to CM and CM to STM, deal with mappings needed for retrieval of entities from a data source, while the two latter deal with writing, i.e. submission to a data source (e.g. filling-in of a form in a data source to place an order, filling-in of a search form or insertion of a new entity in the data source).

The EM to CM, Extraction model to Conceptual model mapping, defines the mapping between the entities and/or attributes retrieved according to the extraction model EM into entities and/or attributes according to a conceptual model CM.

The CM to STM, Conceptual model to Storage model mapping, defines the mapping between the entities and/or attributes held according to the conceptual model CM into entities and/or attributes according to a storage model STM.

The STM to CM, Storage model to Conceptual model mapping, defines the mapping between the entities and/or attributes represented according to the storage model STM into entities and/or attributes according to the conceptual model CM.

The CM to SM, Conceptual model to Submission model mapping, defines the mapping between the entities and/or attributes represented according to the conceptual model CM into entities and/or attributes according to a submission model SM.

Evidently, the mapping from one model to another may be performed in several other ways than the table-based method illustrated in FIG. 4 b within the scope of the invention.

Thus, the mapping may include direct transformation of a number of associated attributes into a unique object in a relational manner. That is, the bundle of associated extractions is transformed as a whole into one unique object instead of applying the above-mentioned method of initially mapping the extraction attributes into conceptual attributes and subsequently establishing a unique entity on the basis of a reference system, e.g. a product catalog defining different possible entities of the domain.

The mapping from the extraction model to the conceptual model preferably involves a classifier (i.e. a classification system) that will map extracted entities into conceptual entities according to a product catalogue. That is, the product catalogue may contain various (generic) conceptual entities existing in the domain. After classification, if a classifier is at all available in the domain, the conceptual entities are made unique according to the extracted entities by transferring various attribute values from the extracted entities to the conceptual entities, such as price, URL, currency etc. This transfer of values from extraction entities to conceptual entities is done by selecting and configuring a transfer function that maps one or more extraction model attribute values into one or more conceptual model attribute values.

In FIG. 4 b, the EM to CM has been selected.

The view area appearing when selecting EM to CM attributes illustrates the attributes to be converted into conceptual entities, e.g. in the form of a table.

In FIG. 4 b, the extraction attribute “Make” has been selected, thereby opening a mapping table where EM-CA A has been selected. The table comprises different applicable mappings from extraction attributes to conceptual attributes, here exemplified by the strings Ferrari, Fiat and Ford converted into integers 17, 18 and 19, respectively.
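A table-based mapping of this kind may, as a minimal sketch, be represented as a simple lookup. The three value pairs are those given for the “Make” attribute; the names MAKE_EM_TO_CM and map_make, and the choice of returning None for unmapped strings, are assumptions made for this illustration only.

```python
# Mapping table for the extraction attribute "Make" (values as in FIG. 4 b).
MAKE_EM_TO_CM = {"Ferrari": 17, "Fiat": 18, "Ford": 19}


def map_make(extracted):
    """Map an extracted make string to its conceptual integer code.

    Surrounding whitespace is ignored. None is returned when the string
    has no entry in the mapping table (an assumption of this sketch).
    """
    return MAKE_EM_TO_CM.get(extracted.strip())


codes = [map_make(s) for s in ["Ferrari", " Fiat ", "Volvo"]]
```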

FIG. 5 illustrates that the PCE table has been double-clicked. A PCE dialogue box PCED appears. This dialogue box facilitates editing of the PCE table defining data, e.g. by insertion of SQL-statements associated with the PCE table, attribute names, etc. Finally, the table may be generated by selecting the Table Generate tag, TAG.

Basically, the storage model may be modeled by known prior art database-generating tools. The important thing when dealing with the database model for the specific domain is to include all necessary attributes and establish a well-structured, easily searchable and quickly accessible database. It should be noted that this structuring of the domain database may be performed independently of the rest of the domain query processor, as long as the necessary entity attributes have been defined.

FIG. 6 illustrates the Domain Modelers Extraction model viewer.

In FIG. 6, the Domain Modelers Extraction model viewer has been selected.

While the database in the database modeler viewer may be regarded as the representation of entities “understood” by the query processor, the domain extraction model to be made by the extraction modeler may be regarded as the definition of relevant attributes included in the syntax of “raw” string-based data of the web-based data sources to be accessed as defined by the data source provider.

Robotmaker

FIG. 7 illustrates the principles of an applicable robot-establishing program according to one embodiment of the invention.

Evidently, the robots to be used in the query processor may be established and attached to a certain data source in many ways within the scope of the invention.

The main principle of the robot generator mentioned below is to make a robot and assign it to a certain site containing data relevant to the domain of interest, i.e. assign the robot to the site by means of an address, e.g. a URL address, and generate a data reader (the robot) capable of reading the data of interest contained in the data source, e.g. a web-site, and transferring these data in a certain data format to the central control of a query processor in response to a query.

Hence, according to a preferred embodiment of the invention, a new and unique robot has to be made for each web-based data source to be queried.

Turning now to FIG. 7, a short overview of this program will be described.

A detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429 filed by the applicant and is hereby incorporated by reference.

The nodes may be arranged in straightforward paths. However, the nodes are typically arranged in branched IF-THEN paths.

The robot generating program is adapted for establishing sequential access of a web-based data source. The control of this sequential reading is e.g. established by means of a graphical path of node processors NP, each node processor NP performing some configurable processing of its input. The nodes are sequenced in such a manner that a web-based data source, e.g. in HTML, may be traversed and data extracted or submitted. It should be noted that high-volume establishment of such robots is somewhat time-consuming. Hence, the robot-generating programs should be very user friendly or even automatic.

A node processor selector NPS is adapted for configuration to the current application in the node processor configuration view NPC. Moreover, the node processor may be attached to a certain document area by means of a document range definer DRD.

Finally, the robot maker viewer comprises a document view which e.g. may be adapted for viewing the XML text of the data source or a part of the data source.

Basically, the robot maker outputs robots and each robot is specialized in operating one dedicated web-based data source.

According to the preferred embodiment of the invention, the robot outputs entities according to the extraction model(s), i.e. non-classified or interpreted data, to a central control, e.g. to a transformer query processor element. Here, the extracted strings may be converted into coded representations, e.g. as objects stored in a database, and the extracted data may then be classified.

Evidently, according to additional/other embodiments of the invention, the established robots may contain transforming means for transformation of extracted data into a conceptual representation, e.g. conversion of a sequence of strings “Ford”, “2.0”, “red” into an object stored in a database as a “car”, which is a red Ford having a 2.0 liter engine. It should be noted that the preferred embodiments of the invention benefit from a more central transformation of entities into conceptual data, thereby reducing the requirements of maintaining decentral transformers.
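The conversion of the string sequence “Ford”, “2.0”, “red” into a conceptual “car” object may, purely as an illustrative sketch, look as follows. The class Car, its field names and the function to_car are hypothetical; the fixed ordering of the three strings is likewise an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """Hypothetical conceptual entity for the car domain."""
    make: str
    engine_liters: float
    color: str


def to_car(extracted):
    """Transform three raw extracted strings into a conceptual Car.

    Assumes the strings arrive in the order make, engine size, color.
    """
    make, engine, color = extracted
    return Car(make=make, engine_liters=float(engine), color=color)


car = to_car(["Ford", "2.0", "red"])
```

Whether such a transformation is placed in the robot or in a central transformer element is a design choice; the sketch is indifferent to where it runs.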

Query Processor Modeler

A query processor modeler according to the invention is intended for establishment of the “transfer function” between the user, the web data accessing machine and the data located in a web-based data source. The meaning of “transfer function” involves a data flow from the user towards the data accessing machine and/or the web-based data sources. Moreover, the transfer function involves control of the flow of data from web-based data sources towards the web-data extraction machine and/or the user.

According to a preferred embodiment of the invention, this functionality is referred to as a query process flow and the established “accessing machine” is referred to as a query processor. The query processor will preferably be adapted for processing of a certain well-defined domain, e.g. a car domain. It should be noted that some kind of overlapping between the domains may be acceptable in the sense that one query processor may e.g. comprise query processor elements accessing data from different domains. Preferably, the domains should be separated since a query processor should only deal with one domain.

The query processor will be defined in a query process graph below by means of a visual programming tool.

FIG. 8 illustrates a preferred embodiment of the invention involving a visual programming tool for establishing the above-mentioned transfer function by means of a query processor graph QPG.

According to a preferred embodiment of the invention, the query processor modeler comprises a visual and programmable editor. The illustrated editor facilitates the combination of a number of Query Processor Elements QPE into a query processor graph. The query processor elements may be of different types defined by their main functions.

Initially, a short introduction of query processor elements will be provided.

An example of a query processor element QPE may e.g. be a robot, such as a robot query processor element RQPE. A robot query processor element RQPE is adapted for accessing web-based data sources upon request. A single robot may typically be attached to one single data source.

Evidently, a robot query processor element may also be adapted for reading only or writing only if suitable.

Another example of a query processor element QPE may e.g. be a cache, such as a cache processor element CQPE. Such an element is adapted for returning a response to a query or it may guide the query further on in the process if the cache contains no answer to the query. A further possibility is that the cache element CQPE returns a part of the response which may be established by means of the entities already contained in the cache, and forwards a query further upstream in the processor in order to establish the rest of the response.
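The partial-response behaviour of such a cache element may be sketched as below. This is a minimal illustration under stated assumptions: a query is modelled as a list of entity keys, the upstream element is any callable returning a dictionary for the keys it is given, and the names CacheElement and robot are hypothetical.

```python
class CacheElement:
    """Sketch of a cache element that answers what it can locally."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable: list of keys -> dict of entities
        self.store = {}

    def query(self, keys):
        # Part of the response established from entities already cached.
        found = {k: self.store[k] for k in keys if k in self.store}
        missing = [k for k in keys if k not in self.store]
        if missing:
            # Only the unanswered part is forwarded further upstream.
            fetched = self.upstream(missing)
            self.store.update(fetched)
            found.update(fetched)
        return found


calls = []


def robot(keys):
    """Stand-in for an upstream robot; records which keys it was asked for."""
    calls.append(list(keys))
    return {k: "entity-" + k for k in keys}


cache = CacheElement(robot)
cache.query(["a"])
result = cache.query(["a", "b"])
```

In the second query only the key "b" reaches the robot, since "a" is already held in the cache.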

A further example of a query processor element QPE may e.g. be a so-called mediator query processor element MQPE. This element is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which queried the mediator MQPE.

Another query processor element may be of a trigger type, i.e. a trigger processor element TPE adapted for triggering a certain operation or a query.

The trigger processor element TQPE is adapted for initiating a certain action, e.g. an automatically scheduled initiation of a query, i.e. an automatic trigger processor element ATPE. Another applicable trigger processor element TPE may e.g. be a trigger adapted for initiation of a query upon request by a user, i.e. a manually activated trigger MTPE. It should be noted that the latter trigger processors represent another type of query processor elements than the first. The trigger query processor element is not activated by an incoming query but at its own initiative. Hence, a manually operated trigger element MTPE may be regarded as an element including a user.

Turning now to FIG. 8, the figure illustrates a query processor adapted for processing a certain domain. According to the illustrated embodiment, the domain comprises three web-based data sources. The illustrated query processor QP is constructed and monitored by means of a visually programmed drag-and-drop query processor graph QPG. The establishment of this query processor graph may also include the configuration of the individual query processor elements. The configuration of e.g. a robot may thus be performed by means of an embedded robot modeler which may be activated via the Query Processor Modeler.

The illustrated query processor graph comprises three robot query processor elements RQPE1, RQPE2 and RQPE3.

Each robot is attached to a specific, dedicated data source, i.e. determined by the URL of the data source. Each robot is made automatically or semi-automatically by means of a robot modeler RM, referred to as both robot maker and robot modeler RM in this application. The robots RQPE1, RQPE2 and RQPE3 are adapted for accessing, i.e. reading and/or writing, the associated data source (not shown) according to a read/write pattern defined and associated with the individual robots. This defined read/write pattern enables each robot to access the corresponding data source. According to a preferred embodiment of the invention, there is a one-to-one relationship between the robots and the data sources, i.e. one web-based data source is accessed by one robot only. The read/write pattern in the robot is typically highly specialized in order to fit the specific data structure of the associated data source. It should be noted that web-based data structures are typically programmed and structured independently, e.g. in HTML tables or other more or less unforeseeable data structures.

The establishment of a read/write pattern may also be referred to as a creation of a robot.

Evidently, the invention offers different web-based data source owners the possibility of entering their data in a data structure which is easy to access by the query processor. Such easy access may e.g. be provided to the data source owners in the form of design requirements if they want their data source to be roboted. Likewise, the query processor may also include data-accessing robots, e.g. by featuring direct ODBC access to the database of the data owner. Thus, it will sometimes be possible to assign a standard robot type to such generalized data source if so desired.

According to a preferred embodiment of the invention, requirements to the data source owner will be kept low, thereby offering the possibility of accessing numerous different data sources.

Turning now to the defined robot query processor element RQPE1, this robot is dedicated to a specific web-based data source and communicates with a query processor element in the form of a cache CQPE1. The cache may be activated by a trigger TQPE1. This trigger element TQPE1 may initiate a certain trigger-defined query subsequently performed by the robot query processor element RQPE1.

The cache element CQPE1 may e.g. be provided as an encapsulation of the robot's data source. This direct and local pre-cache operation on one data source provides the possibility of reducing access time to certain data of the data source operated by the robot RQPE1. Evidently, this facility is attractive for the purpose of bootstrapping the cache with entities (data of the data structure of the data source) that are often queried. The trigger element TQPE1 should typically ensure that data often queried are updated regularly according to a preferred embodiment of the invention in order to avoid a completely empty cache. Evidently, this control may also be integrated in the cache CQPE1 within the scope of the invention. The cache CQPE1 is coupled to a mediator query processor element MQPE1. The functioning of the mediator MQPE1 will be described below. Moreover, the cache element CQPE1 may e.g. be adapted with the purpose of reducing the load on the specific site roboted by the robot element RQPE1 in a more strict sense, as the cache may be adapted for returning entities stored in the cache without querying the robot irrespective of the fact that the entities stored in the cache are not completely updated. Thus, the local cache element CQPE1 may set a minimum interval for activation of the robot RQPE1, thereby ensuring that each and every query does not necessarily result in a query of the data source. This application of a cache may ensure that a certain site is not overloaded by the robot.

A further robot query processor element RQPE2 is dedicated to a specific web-based data source and communicates with a query processor element in the form of a transformer TAQPE1. The transformer element TAQPE1 is adapted for receiving a query from a user-activated query element MTPE located downstream and passing it on towards the data sources located upstream. The illustrated transformer element TAQPE1 channels an unmodified query further on to the robot query processor element RQPE2. Subsequently, when the robot RQPE2 returns a reply to the query, the response may be modified by the transformer before being returned to the connected mediator MQPE1. Such a modification may e.g. be established as a trivial mapping of km: 34 to be read as km: 34,000 or the like. Preferably, utilization of transformers for such purposes should be made when certain data sources, e.g. web-sites, use certain terms deviating from the general terms applied by other data source providers within the domain.
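A transformer of this kind, which passes the query on unmodified and only adjusts the returned entities, may be sketched as follows. The sketch assumes that entities are dictionaries and that the data source in question reports mileage in thousands of kilometres under a key named "km"; the class name KmTransformer and the stand-in robot are hypothetical.

```python
class KmTransformer:
    """Sketch of a transformer placed in front of a single robot."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable: query -> list of entity dicts

    def query(self, q):
        # The query itself is channelled upstream without modification.
        entities = self.upstream(q)
        # Only the response is modified on the way back downstream.
        return [self._fix(e) for e in entities]

    @staticmethod
    def _fix(entity):
        fixed = dict(entity)  # do not mutate the robot's own data
        if "km" in fixed:
            # "34" reported by this source is read as 34,000 km.
            fixed["km"] = int(fixed["km"]) * 1000
        return fixed


def robot(q):
    """Stand-in for the robot; returns one raw extracted entity."""
    return [{"make": q["make"], "km": "34"}]


reply = KmTransformer(robot).query({"make": "Ford"})
```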

The system comprises a further robot query processor element RQPE3 dedicated to a specific web-based data source. This robot RQPE3 is directly coupled to the mediator MQPE1.

The mediator MQPE1 is applied for branching the query process path into several different paths, e.g. three as illustrated. During the return path, the mediator collects the information obtained by the queried robot branches and returns the data to a transformer element TAQPE2.

This transformer element TAQPE2 defines a principal borderline between the upstream robots RQPE1, RQPE2 and RQPE3 and the downstream user U as the transformer performs a transformation of data retrieved by the robots into conceptual data according to a conceptual model associated with each robot. These conceptual data are handed over from the transformer element TAQPE2 to a cache query processor element CQPE2. Typically, the conceptual model should be common for all involved elements dealing with entities in a conceptual manner.

The cache element CQPE2 may be regarded as the main storage means for the query processor QP intended for storage of the currently updated entities retrieved by the robots of the query processor.

The nature of the cache may vary significantly from application to application. In some applications, the cache may comprise only recently entered conceptual data, while caches in other applications may comprise a more or less complete database of the entities comprised in the data sources associated with the domain processor.

The cache CQPE2 may be activated by a trigger query processor TQPE2.

This trigger may e.g. be adapted for refreshing the cache CQPE2 according to scheduled trigger criteria. The trigger criteria may be established on the basis of user query statistics and/or statistics associated with data stored in the cache CQPE2.

The data contained in the cache CQPE2 are conceptual data.

The cache CQPE2 is coupled to a user interface represented by a manually operated trigger element MTPE located downstream of the query processor graph via a tracking module TMO adapted for gathering and storing data. The gathered data are used for keeping track of the history of data contained in the data sources of the domain and for establishing and maintaining query statistics. This tracking module is a combination of a number of query processor elements QPE.

Basically, the module comprises a storing query processor element SQPE1 adapted for writing data into a database query processor element DBPE1. The database DBPE1 comprises entities retrieved from the associated domain of data sources and the entities are stored according to a preferred storage model. The storage may also contain history-describing data or data from which the entities may be deduced. The storing query processor element SQPE1 may be activated by either a user query or a trigger query TQPE3. The trigger query processor element TQPE3 is intended to maintain and establish desired data, such as prices of cars or the like and thereby offer the possibility of registering if an entity comprised in a data source covered by the domain processor has offered another price etc.

Finally, the illustrated query processor path comprises a transformer element TAQPE3. This transformer element is primarily responsible for transforming conceptual data into storage data in the database DBPE1.

Short explanations of some of the above-mentioned query processor elements will be provided below.

Generally, according to a preferred embodiment of the invention, the query processor elements should function without any knowledge of the context.

The Cache Query Processor Element

A cache query processor element according to the invention may be implemented in many ways. Generally, the cache should (as a traditional cache) contain some of the entities recently read from one or some of the data sources. The idea of applying a cache should generally be that of reducing access time to the data sources. Generally, the cache may be controlled in many ways, depending on the purpose. Thus, the cache may be activated from time to time by an automatic trigger with the purpose of refreshing the content of the cache with respect to certain types of entities. Triggering of the cache would then imply that the triggered cache forwards a query to the relevant data sources of the domain, collects the response and writes the returned entities into the memory. Obviously, triggering of the cache may be constructed in numerous ways within the scope of the invention as long as the main purpose of the triggering is to obtain the best possible performance of the current application. Evidently, in some domains, the cache should not be applied for entities exceeding a certain age, e.g. 3 minutes, if the entities contained in the domain change quite often.
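The age limit mentioned above may be illustrated by the following sketch, in which a cached entity older than a configured maximum age is treated as absent. The class AgeLimitedCache and its method names are hypothetical, and the clock is passed in as a parameter (an assumption of this sketch) so that the behaviour can be demonstrated without waiting.

```python
import time


class AgeLimitedCache:
    """Sketch of a cache that refuses to serve entities past a maximum age."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock
        self.store = {}  # key -> (time stored, entity)

    def put(self, key, entity):
        self.store[key] = (self.clock(), entity)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        stamp, entity = item
        if self.clock() - stamp > self.max_age:
            # Too old for this domain: drop it and report a miss.
            del self.store[key]
            return None
        return entity


# Demonstration with a manually advanced clock and a 3 minute limit.
now = [0.0]
cache = AgeLimitedCache(180, clock=lambda: now[0])
cache.put("car-1", {"price": 10000})
now[0] = 179.0
fresh = cache.get("car-1")
now[0] = 181.0
stale = cache.get("car-1")
```

A miss reported in this way would typically cause the query to be forwarded upstream, as described for the cache element above.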

An example of advantageous triggering according to the invention may e.g. be that of triggering the cache with the purpose of refreshing the cache with entities often queried by the users of the query processor. This boot-strapping ensures that start-up time is reduced by maintaining the often queried entities in the cache. The statistical control may therefore imply triggering of the cache which may vary dynamically, i.e. be controlled by the user request.

A further possible approach may e.g. be triggering of the whole domain once a day which means that all relevant data contained in all data sources of the domain are read into the cache and that all data are updated at least once a day. Evidently, according to the latter strategy, the cache is controlled in a manner resembling a kind of persistent database.

The Transformer Query Processor Element

The transformer query processor element is basically an element which may transform an incoming query or entity to another query or entity. Hence, the transformer works both ways: downstream and upstream.

Applicable transformer elements may e.g. be transformers transforming raw extracted text-string entities received from upstream (e.g. from a robot) into entities in a conceptual representation of the entities read from the data-source according to a preferred embodiment of the invention.

Further possible transformer elements may e.g. be a transformer receiving conceptual entities and outputting the entities according to a data storage model.

A further, and simpler, transformer may e.g. be a mute transformer element, arranged in front of a robot or in a certain branch. This mute may be adapted for blocking the entity or query stream in the respective branch. Such a mute transformer may e.g. be advantageous if a certain robot must receive maintenance, thereby offering the possibility to an operator of maintaining a query processor to modify or exchange a certain robot without modifying the query process graph. Hence, a robot may be maintained without simultaneously receiving a stream of queries. It should be noted that the transformers may be arranged in many different positions in the query graph within the scope of the invention.
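A mute transformer may be sketched as an element that either forwards queries to its upstream element or blocks them. The class name MuteTransformer, the muted flag and the choice of returning an empty list for a blocked branch are assumptions of this illustration.

```python
class MuteTransformer:
    """Sketch of a mute element placed in front of a robot or branch."""

    def __init__(self, upstream, muted=False):
        self.upstream = upstream  # callable: query -> list of entities
        self.muted = muted

    def query(self, q):
        if self.muted:
            # Branch blocked: the upstream element is not contacted at all.
            return []
        return self.upstream(q)


received = []


def robot(q):
    """Stand-in robot that records every query reaching it."""
    received.append(q)
    return ["entity"]


mute = MuteTransformer(robot)
before = mute.query("q1")
mute.muted = True          # robot taken out for maintenance
during = mute.query("q2")
mute.muted = False         # robot back in service
after = mute.query("q3")
```

The graph itself is left untouched while the robot is maintained; only the flag on the mute element changes.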

Trigger Query Processor Element

The trigger query processor element comprises means e.g. for invoking a query in an element associated with the trigger. The trigger may then comprise a schedule adapted for defining fixed time intervals which determine when to query the associated element, e.g. a cache. Likewise, the trigger may comprise calculation algorithms adapted for calculating suitable trigger conditions, e.g. when to query, and/or how to query. Therefore, the trigger may advantageously comprise statistical evaluation means.
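A trigger with a fixed-interval schedule may be sketched as below. The sketch assumes that some outer loop calls tick with the current time; the class ScheduledTrigger, the method tick and the use of plain numbers as timestamps are hypothetical choices made for this example, and statistical trigger conditions are not covered.

```python
class ScheduledTrigger:
    """Sketch of a trigger that queries its element at fixed intervals."""

    def __init__(self, interval_seconds, element, query):
        self.interval = interval_seconds
        self.element = element  # callable invoked with the stored query
        self.query = query
        self.next_due = 0.0

    def tick(self, now):
        """Fire if the interval has elapsed; return True when fired."""
        if now < self.next_due:
            return False
        self.next_due = now + self.interval
        self.element(self.query)
        return True


refreshed = []
trigger = ScheduledTrigger(60.0, refreshed.append, {"make": "Ford"})
fired = [trigger.tick(t) for t in (0.0, 30.0, 60.0, 90.0)]
```

Here the list refreshed stands in for the associated element, e.g. a cache that would forward the query upstream when triggered.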

Mediator Query Processor Element

A mediator query processor element MQPE is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which initially queried the mediator MQPE.

Hence, the mediator may show several different levels of intelligence, from the somewhat simple and uncomplicated branch element simply distributing an incoming query to a number of branching elements, to quite intelligent elements capable of distributing an incoming query to the branches most likely comprising the queried entities.
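Both the simple and the more selective mediator may be illustrated by one sketch in which every branch carries a predicate deciding whether the query is forwarded to it. The class Mediator, the (accepts, element) pairing and the stand-in robots are assumptions of this example; a simple mediator corresponds to every predicate returning True.

```python
class Mediator:
    """Sketch of a mediator distributing a query and gathering responses."""

    def __init__(self, branches):
        # Each branch is an (accepts, element) pair of callables.
        self.branches = branches

    def query(self, q):
        results = []
        for accepts, element in self.branches:
            if accepts(q):
                results.extend(element(q))
        return results


def always(q):
    return True


def fiat_only(q):
    # Hypothetical rule: this branch is only likely to hold Fiat entities.
    return q.get("make") == "Fiat"


def make_robot(source):
    """Build a stand-in robot that tags its entities with a source name."""
    def robot(q):
        return [{"source": source, "make": q["make"]}]
    return robot


mediator = Mediator([
    (always, make_robot("site1")),
    (always, make_robot("site2")),
    (fiat_only, make_robot("site3")),
])
answer = mediator.query({"make": "Ford"})
```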

A mediator may deal with data according to any representation, e.g. conceptual entities, storage entities or extraction entities.

Messenger Query Processor Element

Other possible types of query processor elements to be included in the query processor graph may e.g. be messenger query processor elements MESQPE. The messenger elements MESQPE are adapted for monitoring the process of the individual QPEs or between the QPEs. These messengers may e.g. be adapted for returning a processor's state-describing parameters to an operator responsible for the query processor or the query processor element. Messengers may e.g. be adapted for providing statistical material or fault warnings.

It should be noted that the conceptual building of the domain processor may be performed in many different ways. This means that the word “element” and the word “graph” should in no way restrict the scope of the invention in the sense that the wording primarily reflects the functional understanding of the elements. Evidently, other types of elements may be derived within the scope of the invention, e.g. elements combined on the basis of the above-mentioned elements. Examples of such possible derivatives within the scope of the invention may e.g. be a robot processor comprising a transformer (i.e. the robots read extraction entities, transform the data to conceptual entities, and return the entities to a central control, e.g. a database), a cache comprising a transformer, a cache comprising a trigger, etc.

A further advantageous messenger may e.g. be a messenger adapted for raising a flag to the operator managing the query processor when the entities to be transformed into conceptual data are not contained in a reference product catalogue, thereby offering the operator the possibility of updating such a catalogue locally or globally.

Other advantageous elements may e.g. be elements directly adapted for reading a well-known database, i.e. by means of ODBC drivers, thereby making it possible for extracted reading of “foreign” web-based data sources to be supplemented by readings from few or several databases comprising entities included by the domain.

According to the invention, each of the present elements may be activated by clicking on the element in the editor, thereby initiating/activating the element-creating application. Hence, the RobotMaker application will be activated by double-clicking on a selected robot, e.g. RQPE1, and the Domain Modeler will be activated when double-clicking on e.g. the transformer TAQPE2.

When the query processor graph QPG has been established, the graph may be saved, thereby maintaining the properties of the complete query processor QP.

The structure and functioning of the individual query processor elements are defined by means of the domain modeler DMR and the Robotmaker RM. Evidently, some of the query processor elements are domain independent in the sense that they may be included in the query processor graph of several different types of query processors DP, e.g. trigger processor elements with little or no modification, whereas other query processor elements are somewhat domain specific. An example of a domain independent query processor element may e.g. be the aforementioned mute transformer element which may be applied by any desired domain without pre-modification.

It should be noted that the Query Processor Modeler may even, and preferably, include query processor execution tools included in the illustrated “view” setup. Such a setup may include the illustrated view which, when in run mode, illustrates the running state of the query processor and the individual elements. An example of such intuitive processing is that the individual elements change color according to the state, e.g. within a color range from white to red, depending on the load of the elements.

Moreover, the interface, e.g. the illustrated view, should preferably visually illustrate basic on-off conditions, i.e. illustrate actively if an element is working properly, and whether entities are transferred between the query processor elements and preferably whether entities may actually be transferred between elements. The latter feature may ease operation of the system significantly due to the fact that the absence of an entity flow between the elements does not necessarily indicate that a fault-condition has occurred; the element may simply not have been queried.

Determination of a “clear road” between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.

Moreover, the Query Processor Modeler may include submenus facilitating specialized execution of the query processor. Such a submenu is illustrated in FIG. 9, and it may e.g. be selected by the “run” dropdown menu of the Query Processor Modeler.

Moreover, the Query Processor Modeler may feature specialized visualization of certain groups of query processor elements. Thus, a “robot element” viewer may be activated, thereby offering the operator the possibility to concentrate fully on his task, e.g. maintenance or design of robot elements and thereby ignore elements dealt with by other operators.

It should be noted that a query processor according to the invention may easily comprise several hundreds of robots.

Likewise, other designers may advantageously activate a “no robot view” while designing the main body of the query processor.

It should also be noted that the above-mentioned examples of elements may be combined into groups of macro-elements, e.g. of a robot element comprising a transformer, etc.

FIG. 9 illustrates a possible user interface of a domain processor DP. A domain processor is adapted for supporting maintenance of one or several query processors QP when established.

The illustrated user interface of a domain processor comprises a tree-based structure monitoring area. One domain processor may control execution and maintenance of several different domains.

This area monitors a first level of node-represented servers NL1. This level illustrates the different servers applied: WebServer, RobotServer1 and RobotServer2. A second node level NL2 shows the current domains controlled by the domain server, e.g. Cars, Yachts and PCs. A third level NL3 illustrates different selectable query processor state-indicating functions, e.g. queries, triggers and messages. The function Messages has been selected in the illustrated view.
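The three node levels can be sketched as a nested mapping. The server, domain and function names are taken from the illustrated view; the assumption that every server lists every domain, the `/`-separated path format, and the helper names `build_tree` and `paths` are hypothetical and serve only to illustrate the structure.

```python
from typing import Dict, List

SERVERS = ["WebServer", "RobotServer1", "RobotServer2"]  # node level NL1
DOMAINS = ["Cars", "Yachts", "PCs"]                      # node level NL2
FUNCTIONS = ["Queries", "Triggers", "Messages"]          # node level NL3

Tree = Dict[str, Dict[str, List[str]]]


def build_tree() -> Tree:
    """Build server -> domain -> functions (assumes all servers host all domains)."""
    return {s: {d: list(FUNCTIONS) for d in DOMAINS} for s in SERVERS}


def paths(tree: Tree) -> List[str]:
    """Flatten the tree into selectable 'server/domain/function' paths."""
    return [
        f"{server}/{domain}/{function}"
        for server, domains in tree.items()
        for domain, functions in domains.items()
        for function in functions
    ]


tree = build_tree()
all_paths = paths(tree)
```

Selecting the function Messages for a given domain then corresponds to picking one such path, e.g. `RobotServer1/Cars/Messages`.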

It should be noted that the term server referred to in level 1 NL1 may either reflect a physical location of a query processor with respect to a server, or refer to a kind of virtual server comprising several different servers, each processing its part (e.g. an element or groups of elements) of the query processor.

Moreover, the illustrated viewer comprises a message viewing area MVA adapted for viewing messages forwarded automatically by e.g. different unique elements of a query process path or groups of elements. The attributes of listed messages may e.g. be chosen as the illustrated Title, Date, Priority, Origin Element.

The viewer may moreover facilitate filtering of the messages by their origin element. Hence, an operator may e.g. establish a filtering of messages from a certain element, Origin Element, or from groups of elements, e.g. mediators or transformers.
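Such filtering can be sketched as follows. The attributes Title, Date, Priority and Origin Element follow the illustrated view; the `origin_group` field, the `filter_messages` helper and the sample messages are invented for the example and do not come from the described system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    title: str
    date: str            # ISO date string, chosen here for simplicity
    priority: int
    origin_element: str
    origin_group: str    # hypothetical: the kind of element, e.g. "robot"


def filter_messages(messages: List[Message],
                    element: Optional[str] = None,
                    group: Optional[str] = None) -> List[Message]:
    """Keep messages matching the given origin element and/or element group."""
    return [
        m for m in messages
        if (element is None or m.origin_element == element)
        and (group is None or m.origin_group == group)
    ]


# Invented sample data for illustration only.
inbox = [
    Message("Login page changed", "2001-03-01", 1, "CarRobot7", "robot"),
    Message("Unit conversion failed", "2001-03-02", 2, "PriceTransformer", "transformer"),
    Message("Source timed out", "2001-03-02", 3, "CarRobot9", "robot"),
]

from_one_element = filter_messages(inbox, element="CarRobot7")
from_robots = filter_messages(inbox, group="robot")
```

Passing neither argument returns the unfiltered list, which corresponds to the default message viewing area.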

Moreover, the viewer comprises a message detail window MDW. This window may illustrate details about a single message or groups of messages selected in the message viewing area MVA. Each message may e.g. be associated with a startup facility for activating the editor or editors associated with the individual message.

A query element program, e.g. a robot editor, may be started directly from the domain processor DP, e.g. by automatically importing the data from an element selected in the viewer such as a specific robot.

1. A Query Processor Modeler, comprising: means for selecting at least two Query Processor elements from a set of predefined query processor elements; means for attaching at least one selected robot query processor element to a dedicated data source of the data sources of the domain; means for combining at least two of the selected Query Processor elements; and means for executing said associated query processor elements on at least one computer system; and at least one of said query processor elements of the associated query processor elements being a Robot Query Processor Element adapted for accessing a dedicated web based data source, wherein the Query Processor Modeler further comprises a graphical user interface in a form of a visual programming tool means for customizing the at least one individual robot query processor element to the corresponding attached data sources; means for customizing at least one trigger query processor element to the query processor; and means for storing said combination of the selected Query Processor elements on a storage medium.
2. A Query Processor Modeler, comprising: means for selecting at least two Query Processor elements from a set of predefined query processor elements; means for attaching at least one selected robot query processor element to a dedicated data source of the data sources of the domain; means for combining at least two of the selected Query Processor elements; and means for executing said associated query processor elements on at least one computer system; and at least one of said query processor elements of the associated query processor elements being a Robot Query Processor Element adapted for accessing a dedicated web-based data source, wherein at least one of said of query processor elements of the associated query processor elements, is a trigger query processor element means for customizing the at least one individual robot query processor element to the corresponding attached data sources; means for customizing the at least one trigger query processor element to the query processor; and means for storing said combination of the selected Query Processor elements on a storage medium.
3. A Query processor, comprising: a set of web-based data sources, wherein at least two of said data sources comprise entities according to a domain model; and at least three query processor elements, at least two of said query processor elements comprising a robot, said robot being attached to a dedicated data source, said robot comprising means for accessing information from the at least one data source according to at least one extraction model associated with said robot, at least one of said query processor elements comprising a trigger query processor element, said trigger query processor element comprising means for establishing a query, or at least one of the query processor elements comprises a transformer query processor element, a messenger query processor element or a mediator query processor element means for selecting at least two Query Processor elements from a set of predefined query processor elements; means for attaching at least one selected robot query processor element to a dedicated data source of the data sources of the domain; and means for combining the selected query processor elements into a query processor by means of a graphical user interface wherein said graphical user interface defines a query processor element path visually on a drag and drop basis; and means for storing said combination of the selected Query Processor elements on a storage medium.
4. A Method of establishing at least one query processor, said query processor comprising a set of web-based data sources, wherein at least two of said data sources comprise entities according to a domain model, said query processor comprising at least three query processor elements, at least two of said query processor elements comprising a robot, said robot comprising means for accessing information from the a dedicated data source according to at least one extraction model associated with said robot, at least one of said query processor elements comprising a trigger, said trigger query processor element comprising means for establishing a query, said method comprising: selecting at least two Query Processor elements from a set of predefined query processor elements; attaching at least one selected robot query processor element to the dedicated data sources of the domain; combining the selected query processor elements into a query processor by means of a graphical user interface wherein said graphical user interface defines a query processor element path visually on a drag and drop basis customizing the at least one individual robot query processor element to the corresponding attached data sources; customizing the at least one trigger query processor element to the query processor; and storing said combination of the selected Query Processor elements on a storage medium.
5. A Method of establishing at least one query processor, said query processor comprising a set of web-based data sources, wherein at least two of said data sources comprise entities according to a domain model, said query processor comprising at least three query processor elements, at least two of said query processor elements comprising a robot, said robot comprising means for accessing information from the a dedicated data source according to at least one extraction model associated with said robot, at least one of said query processor elements comprising a trigger, said trigger query processor element comprising means for establishing a query, said method comprising: selecting at least two Query Processor elements from a set of predefined query processor elements; attaching at least one selected robot query processor element to the dedicated data sources of the domain; combining the selected query processor elements into a query processor by means of a graphical user interface, wherein at least one of the combined query processor elements comprises a transformer query processor element, a messenger query processor element, or a mediator query processor element; customizing the at least one individual robot query processor element to the corresponding attached data sources; customizing the at least one trigger query processor element to the query processor.
6. Method of establishing at least one query processor, said query processor comprising means for accessing data from web-based data sources of a domain by means at least one user interface, said method comprising: selecting a number of query processor elements, at least one of said selected query processor elements being a robot query processor element, at least one of said selected query processor elements being a trigger query processor element; attaching at least one selected robot query processor element to a dedicated data source of the data sources of the domain; combining the selected query processor elements into at least one query path defining the data flow in the query processor between the user interface and the web-based data sources of the domain, wherein at least one of the combined query processor elements comprises a transformer query processor element, a messenger query processor element, or a mediator query processor element; customizing the at least one individual robot query processor element to the corresponding attached data sources; customizing the at least one trigger query processor element to the query processor; and storing said combination of the selected Query Processor elements on a storage medium.